Matching and maximizing are two ends of a spectrum of policy search algorithms
January 2, 2004
Abstract
According to the matching law, when an animal makes many repeated choices between alternatives, its preferences are in the ratio of the incomes derived from the alternatives. Because matching behavior does not maximize reward, it has been difficult to explain using optimal foraging theory or rational choice theory. Here I show that matching and maximizing can be regarded as two ends of a spectrum of policy search algorithms from reinforcement learning. The algorithms are parametrized by the time horizon within which past choices are correlated with present reward. Maximization corresponds to the case of a long time horizon, while matching corresponds to a short horizon. From this viewpoint, matching is an approximation to maximizing, with the advantage of faster learning and more robust performance in nonstationary environments. Between these two ends of the spectrum lie many strategies intermediate between matching and maximizing.
If an animal’s relative preferences for alternatives are in the ratio of the incomes derived from them, then its behavior is said to be “matching.” Matching behavior has been observed for certain types of reinforcement schedules, in particular those that randomize the interval between reward. The matching law was important because it gave the law of effect a quantitative formulation.
Given the matching law as an empirical observation about behavior, two questions immediately come to mind. The first question is functional: why is matching a good policy for animals to follow? The second is mechanistic: what neural mechanisms underlie the production of matching behavior? This note mainly addresses the first question, by elucidating the function of matching from the viewpoint of the mathematical theory of reinforcement learning. However, the second question is also peripherally addressed through mathematical developments that are shared by recent neural network models of matching behavior.
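As a concrete reading of the definition above, the following minimal sketch checks how closely a record of behavior satisfies the matching law: it compares the fraction of choices allocated to each alternative with the fraction of total income obtained from it. The function names (`fractions`, `matching_gap`) and the session numbers are hypothetical and introduced only for illustration; this is not one of the policy search algorithms described in the note.

```python
from typing import List, Sequence


def fractions(values: Sequence[float]) -> List[float]:
    """Normalize non-negative values so that they sum to 1."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    return [v / total for v in values]


def matching_gap(choices: Sequence[float], incomes: Sequence[float]) -> float:
    """Largest absolute difference between choice and income fractions.

    choices[i] is how often alternative i was chosen; incomes[i] is the
    total reward obtained from it. A gap of 0 means exact matching.
    """
    if len(choices) != len(incomes):
        raise ValueError("choices and incomes must have the same length")
    choice_frac = fractions(choices)
    income_frac = fractions(incomes)
    return max(abs(c - r) for c, r in zip(choice_frac, income_frac))


# Hypothetical session (assumed numbers): alternative A was chosen 300 times
# and yielded 30 rewards; alternative B was chosen 100 times and yielded 10.
gap = matching_gap([300, 100], [30, 10])
print(f"matching gap: {gap:.3f}")
```

In this assumed session, 75% of choices and 75% of income belong to alternative A, so the behavior matches. Splitting choices evenly while income stays at a 3:1 ratio would give a nonzero gap.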
One of the most common ways to explain the function of a behavior is to argue that it has been adapted by evolution to be optimal. Such an explanation for matching has been elusive, because matching does not generally maximize the animal’s overall rate of reward. In this respect, matching is suboptimal, which has allowed it to be used as an explanation for “irrational” human behaviors, such as addiction and other behaviors attributed to lack of “self-control.” Nevertheless, it would be hasty to completely reject optimality as an explanation of matching. Often matching is close to optimal, even if it is not exactly so.
Publication date: 2004